# Start by importing necessary packages
import requests
import pandas as pd
from IPython.display import display
from io import StringIO
from dstapi import DstApi # The helper class
Using DST’s API from Python
I am a huge fan of Statistics Denmark (DST). Their Statistikbanken contains a wealth of data on Danish society, the economy, and the population.
Not only are all these data publicly available, but DST has for years also provided access to all its published data tables through an API, documented here. The API makes it extremely easy to access and use the data. Yet unless one already has some experience with APIs, using it might feel complex for an occasional student or analyst.
This notebook provides a quick guide on how to access data from DST’s Statistikbanken through their API, and presents a utility class I wrote to make it easier to pull data tables for analytical purposes. The only explicit dependency of the utility is `pandas`, which is an extremely widespread package anyway.
Both notebook and class can be found at this GitHub repository.
The utility can be installed with:
pip install git+https://github.com/alemartinello/dstapi.git
DST’s API has four separate function calls to programmatically navigate the published tables. This guide assumes that the analyst has already scouted Statistikbanken, and has identified the one or two tables from which data should be extracted. For these purposes, we only need two function calls: `tableinfo` and `data`.

* `data`, as the name implies, is the function that allows us to get data from a specific table
* `tableinfo` provides all the necessary metadata of a specific table. This latter function is crucial and extremely useful, as it lets us know how a table is structured, and which filters to pass to a `data` call

The standard process is to begin by obtaining the necessary information from `tableinfo`, and then construct the call to pass to `data`.

For each step of the process, this guide first shows how to do it directly with `requests` (and `pandas`), and then shows how the utility class `DstApi` can facilitate the process.
Step 1: Understand what a table has to offer and how it is structured
Our primary example will be DST’s table METROX1, which reports an index measuring the weekly number of passengers travelling by metro in Copenhagen. This index was developed to measure the population’s response to the COVID-19 pandemic. The table is small and simple, allowing for quick experimentation.
The hard way
Since we know the table’s name/id, we can start by accessing the API directly through the Python package `requests`, and ask for the table’s metadata (`tableinfo`).
An API call is composed of a main web address, a function call, and a set of parameters. The main web address is https://api.statbank.dk/v1. The function call in this case is `tableinfo`. The set of necessary parameters, per the documentation, is the `id` of the table and the `format` in which we’d like to receive the information. We’ll pick `"metrox1"` for the first (note that the table-id parameter is not case-sensitive) and `"JSON"` for the second.
The API at DST can be called through both `requests.get()` and `requests.post()`. DST’s documentation recommends using `post`, because as the number and complexity of parameters grows (with some of them containing non-standard Danish characters), it becomes harder to embed them in a URL. However, as the call to `tableinfo` is simple, below I provide examples using both methods.
Note that the `.json()` method of the `requests.Response` object returns the response content (which we requested in JSON format) rather than the object itself. That’s just to print out the output in the notebook.
This function returns a wealth of information: not just the table id and description, but also the contact details of the person responsible for the statistics and, crucially, the names and values of the variables defining the table. In this case, those are `SÆSON` and `Tid`.
The code below shows how to get the table’s metadata, and prints the beginning of the JSON file returned.
# Directly embed parameters in the URL with requests.get()
requests.get('https://api.statbank.dk/v1' + '/tableinfo' + "?id=metrox1&format=JSON").json()

# Pass a dictionary of parameters to requests.get()
params = {'id': 'metrox1', 'format': 'JSON'}
requests.get('https://api.statbank.dk/v1' + '/tableinfo', params=params).json()

# Use requests.post() - note the change in the name of the parameter for the table's name
# I'm also adding a language parameter here - most tables are available in both Danish and English
params = {'table': 'metrox1', 'format': 'JSON', 'lang': 'en'}
table_metadata = requests.post(
    'https://api.statbank.dk/v1' + '/tableinfo', json=params
).json()
print(str(table_metadata).replace(',', ',\n')[0:500] + '\n...')
{'id': 'METROX1',
'text': 'Workday passenger index in the Copenhagen Metro (experimental statistics)',
'description': 'Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time',
'unit': 'Index',
'suppressedDataValue': '0',
'updated': '2022-06-16T08:00:00',
'active': False,
'contacts': [{'name': 'Peter Ottosen',
'phone': '+4530429191',
'mail': 'pot@dst.dk'}],
'documentation': None,
'footnote': {'text': 'Data are indexed against an averag
...
This wealth of information is already fantastic. That metadata contains pretty much everything you need to figure out whether you can actually use the table, and eventually how you want to select the data (seasonally adjusted? For 2020 only?). Yet that JSON file might be tough to digest, especially for more complex tables. Those cases might require preprocessing and a different type of visualization. That’s where the `DstApi` helper class comes into play.
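Roughly speaking, `DstApi` does this kind of preprocessing for you. Just to illustrate the idea, here is a rough sketch of how you could flatten the metadata manually with plain pandas; it assumes the `table_metadata` dictionary obtained above, and that each entry in its `variables` list carries the `id`, `text`, and `values` fields visible in the JSON output.

```python
# Rough sketch: summarize the 'variables' metadata into a dataframe
# (assumes table_metadata from the tableinfo call above)
summary = pd.DataFrame(
    [
        {
            'variable': var['id'],
            'description': var['text'],
            'n_values': len(var['values']),
            'first_value': var['values'][0]['id'],
            'last_value': var['values'][-1]['id'],
        }
        for var in table_metadata['variables']
    ]
)
display(summary)

# Zoom into one variable, e.g. SÆSON, and list its possible values
saeson = next(v for v in table_metadata['variables'] if v['id'] == 'SÆSON')
display(pd.DataFrame(saeson['values']))
```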
The easy way
`DstApi` has two methods for examining metadata.

The first one, `tablesummary`, summarizes the main metadata information:

* The id and description of the table
* The last update time
* A table with the main available cuts of the data. Each row corresponds to a variable we can filter on, with examples of variable values and labels
# Initialize the class with the target table
metro = DstApi('METROX1')

# Get the table summary
metro.tablesummary(language='en')
Table METROX1: Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time
Last update: 2022-06-16T08:00:00
| | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | SÆSON | 2 | 10 | Seasonally adjusted | 11 | Non-seasonally adjusted | False |
| 1 | Tid | 122 | 2020U01 | 2020U01 | 2022U23 | 2022U23 | True |
The second method, `variable_levels`, zooms into a specific variable and returns a dataframe listing its potential values. For example, we can check each value of `SÆSON`:
metro.variable_levels('SÆSON', language='en')
| | id | text |
|---|---|---|
| 0 | 10 | Seasonally adjusted |
| 1 | 11 | Non-seasonally adjusted |
Now, we already knew these values for this simple table from `tablesummary()`, as there are only two. But for more complex tables, this method is very handy. Take for example DNVPDKR2, a table showing the outstanding amount of mortgage bonds issued by Danish mortgage institutes. If I wanted, say, to extract only data about fixed-rate convertible bonds, it would be hard to know in advance that I should be referring to the value `FK`.
But I can use:

* the method `.tablesummary()` to see which variables I can select on
* the method `.variable_levels()` to see which values are available for each variable

Here `*` is a wildcard that selects all available values for a variable.
dnrk = DstApi('DNVPDKR2')
dnrk.tablesummary(language='en')
Table DNVPDKR2: Danish mortgage bonds by type of mortgage bond, original maturity, remaining maturity, coupon (nominal interest rate), currency, issuer, investor sector, covered bonds, data type and time
Last update: 2024-11-28T08:00:00
| | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | TYPREAL | 9 | A0 | All mortgage bonds | O | 1.6 Other mortgage bonds | False |
| 1 | LØBETID3 | 7 | A0 | All original maturities | 6 | Other (unspecified) | False |
| 2 | LØBETID2 | 9 | A0 | All remaining maturities | 8 | Other (unspecified) | False |
| 3 | KUPON2 | 15 | A0 | All coupons | N | Other coupons | False |
| 4 | VALUTA | 6 | A0 | All currencies | O | Other | False |
| 5 | UDSTED | 10 | A0 | All issuers | O | Other issuers | False |
| 6 | INVSEKTOR | 10 | A0 | All sectors | U | 2. Foreign (S.2) | False |
| 7 | DAEKOBL | 4 | A0 | All mortgage bonds | RO | RO (other mortgage bonds) | False |
| 8 | DATAT | 5 | N1 | Stock - Nominal | B3 | Value adjustments - Market value | False |
| 9 | Tid | 299 | 1999M12 | 1999M12 | 2024M10 | 2024M10 | True |
dnrk.variable_levels('TYPREAL', language='en')
| | id | text |
|---|---|---|
| 0 | A0 | All mortgage bonds |
| 1 | FK | 1.1 Fixed rate convertible bonds |
| 2 | FKU | - - 1.1.1 Open for issue - fixed rate converti... |
| 3 | FKE | - - 1.1.2 No longer open for issue - fixed rat... |
| 4 | I | 1.2 Indexed bonds |
| 5 | RTL | 1.3 Adjustable rate bonds (RTL bonds) |
| 6 | V | 1.4 Bonds with a reference rate (without inter... |
| 7 | VR | 1.5 Bonds with a reference rate (with interest... |
| 8 | O | 1.6 Other mortgage bonds |
Step 2: Get the data you need
The first step is essential for designing this second step. First and foremost, because we need that information to design the call to `data`. Second, to make sure we only get out the data we need. Asking for too much data, only to then throw half of it away locally, is wasteful, and ultimately disrespectful of the resources invested in allowing anyone to fire up an API call (I mean, how amazing is that?).
The hard way
As for the first step, we’ll start by doing it manually. Here I’ll rely exclusively on `requests.post()`, as recommended by DST.
To select the query parameters to pass to the `data` function appropriately, one ought to have a careful look at the DATA section in the documentation. Nonetheless, hopefully the examples below will serve to clarify how to construct such calls.
The first two key parameters are, as before, the table name and the format in which we’d like to obtain the data. In the examples below I choose `BULK`, which has the advantage of being faster and allowing an unlimited number of data rows per export. There are some limitations to this format, such as the inability to perform simple computations (e.g. sums) on the fly. If you need those features, you probably don’t need this guide, so I’ll stick with `BULK` here.
The third crucial parameter is the selection based on the variables shown in e.g. `DstApi.tablesummary()`. These are mandatory: we need to specify the selection we want to make. We may, however, choose to include a range of possible values, or all of them, in a selection. In this case, the character `*` acts as a wildcard. So to select all values of a variable, we can use `*`. To select all 2020 weeks in `Tid`, we could use `2020*`.
Below I write the parameters necessary to download the seasonally adjusted (code 10) index for all weeks in the data, and pass them to `requests.post()`. Finally, I print the first 200 characters of the data we received back (in `;`-separated format).
params = {
    'table': 'metrox1',
    'format': 'BULK',
    'variables': [
        {'code': 'SÆSON', 'values': ['10']},
        {'code': 'Tid', 'values': ['*']}
    ]
}
r = requests.post('https://api.statbank.dk/v1' + '/data', json=params)
print(r.text[:200])
SÆSON;TID;INDHOLD
Sæsonkorrigeret;2020U01;37,7
Sæsonkorrigeret;2020U08;105,0
Sæsonkorrigeret;2020U09;95,2
Sæsonkorrigeret;2020U10;93,0
Sæsonkorrigeret;2020U11;63,0
Sæsonkorrigeret;2020U12;17,9
Neat! We can then save this data to a csv file or whatever, or directly import it into pandas:
pd.read_table(StringIO(r.text), sep=';').head()
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Sæsonkorrigeret | 2020U01 | 37,7 |
| 1 | Sæsonkorrigeret | 2020U08 | 105,0 |
| 2 | Sæsonkorrigeret | 2020U09 | 95,2 |
| 3 | Sæsonkorrigeret | 2020U10 | 93,0 |
| 4 | Sæsonkorrigeret | 2020U11 | 63,0 |
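If instead you just want the raw file on disk, writing the response text out is a one-liner; the filename below is just an example.

```python
# Save the raw ;-separated response from the call above to a local file
# (the filename is arbitrary)
with open('metrox1_seasonally_adjusted.csv', 'w', encoding='utf-8') as f:
    f.write(r.text)
```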
Keep in mind that you can also specify intervals for time variables, as in the example below, where I also require the data to be exported in English.
params = {
    'table': 'metrox1',
    'format': 'BULK',
    'lang': 'en',
    'variables': [
        {'code': 'SÆSON', 'values': ['11']},
        {'code': 'Tid', 'values': ['>2020U45<=2020U52']}
    ]
}
df = pd.read_csv(
    StringIO(
        requests.post('https://api.statbank.dk/v1' + '/data', json=params).text
    ), sep=';'
)
df.head()
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Non-seasonally adjusted | 2020U46 | 56.2 |
| 1 | Non-seasonally adjusted | 2020U47 | 55.5 |
| 2 | Non-seasonally adjusted | 2020U48 | 58.3 |
| 3 | Non-seasonally adjusted | 2020U49 | 57.6 |
| 4 | Non-seasonally adjusted | 2020U50 | 46.9 |
The easy-er way
The code above is already quite compact, but to save us from remembering how to import the data into pandas every time, `DstApi` has a method to import the data directly into pandas given a parameter dictionary. So, for example, given the `params` dictionary defined above, we can call directly:
metro.get_data(params=params)
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Sæsonkorrigeret | 2020U01 | 37.7 |
| 1 | Sæsonkorrigeret | 2020U08 | 105.0 |
| 2 | Sæsonkorrigeret | 2020U09 | 95.2 |
| 3 | Sæsonkorrigeret | 2020U10 | 93.0 |
| 4 | Sæsonkorrigeret | 2020U11 | 63.0 |
| ... | ... | ... | ... |
| 117 | Sæsonkorrigeret | 2022U19 | 99.5 |
| 118 | Sæsonkorrigeret | 2022U20 | 95.7 |
| 119 | Sæsonkorrigeret | 2022U21 | 103.1 |
| 120 | Sæsonkorrigeret | 2022U22 | 108.4 |
| 121 | Sæsonkorrigeret | 2022U23 | 109.3 |

122 rows × 3 columns
The `.get_data()` method also has a built-in option to download an entire data table, by not passing any parameter dictionary. As mentioned above, this can be (very) wasteful: some DST tables contain billions of data points. That’s why, when used in this way, the method asks for explicit confirmation before proceeding.
However, creating the `params` dictionary itself can be challenging. As we have seen above with table DNVPDKR2, table structures can be complex, and creating the parameter dictionary manually can be cumbersome.
That’s why `DstApi` has a helper method that returns a base dictionary of parameters with default values.

Only skip passing a `params` dictionary to `.get_data()` if you know what you are doing: some tables in Statistikbanken have millions of records. Downloading them all through the API can take a lot of time, and it’s extremely wasteful if you in fact only need a fraction of the data.
# Start by constructing a basic dictionary
dnrk = DstApi('DNVPDKR2')
params = dnrk.define_base_params(language='en')
params
{'table': 'dnvpdkr2',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'TYPREAL', 'values': ['*']},
  {'code': 'LØBETID3', 'values': ['*']},
  {'code': 'LØBETID2', 'values': ['*']},
  {'code': 'KUPON2', 'values': ['*']},
  {'code': 'VALUTA', 'values': ['*']},
  {'code': 'UDSTED', 'values': ['*']},
  {'code': 'INVSEKTOR', 'values': ['*']},
  {'code': 'DAEKOBL', 'values': ['*']},
  {'code': 'DATAT', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}
Once I have the basic structure, I can copy-paste the dictionary definition and use the method `variable_levels` to narrow down the data selection. For example, I would like to keep only 30-year bonds issued by Realkredit Danmark: the code below tells me to use value `5` for the maturity variable `LØBETID3` (and a similar call for `UDSTED` shows that the issuer code for Realkredit Danmark is `RD`).
dnrk.variable_levels('LØBETID3', language='en')
| | id | text |
|---|---|---|
| 0 | A0 | Alle oprindelige løbetider |
| 1 | 1 | Under 10 år |
| 2 | 2 | 10-årige |
| 3 | 3 | 15-årige |
| 4 | 4 | 20-årige |
| 5 | 5 | 30-årige |
| 6 | 6 | Andet (uspecificeret) |
I can further refine my query by filling in the required selection parameters, and then call the `get_data()` method to extract the final dataframe.
params = {'table': 'dnvpdkr2',
          'format': 'BULK',
          'lang': 'en',
          'variables': [{'code': 'TYPREAL', 'values': ['FK']},
                        {'code': 'LØBETID3', 'values': ['5']},
                        {'code': 'LØBETID2', 'values': ['A0']},
                        {'code': 'KUPON2', 'values': ['A0']},
                        {'code': 'VALUTA', 'values': ['DKK']},
                        {'code': 'UDSTED', 'values': ['RD']},
                        {'code': 'INVSEKTOR', 'values': ['A0']},
                        {'code': 'DAEKOBL', 'values': ['A0']},
                        {'code': 'DATAT', 'values': ['N1']},
                        {'code': 'Tid', 'values': ['*']}]}
df = dnrk.get_data(params=params, language='en')
df.tail()[["TID", "INDHOLD"]]
| | TID | INDHOLD |
|---|---|---|
| 294 | 2024M06 | 229104 |
| 295 | 2024M07 | 228410 |
| 296 | 2024M08 | 229411 |
| 297 | 2024M09 | 231149 |
| 298 | 2024M10 | 231694 |
And just like that, I have the full time series of the nominal value of RD’s 30-year fixed-rate bonds.
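From here on it is ordinary pandas work. As a small sketch (using the `df` produced above, and assuming the monthly `TID` codes follow the `2024M10` pattern shown), we can parse the period codes into datetimes and turn the result into a clean time series:

```python
# Parse DST's monthly period codes (e.g. '2024M10') into datetimes
# and index the nominal outstanding amounts by date
ts = (
    df.assign(date=pd.to_datetime(df['TID'], format='%YM%m'))
      .set_index('date')['INDHOLD']
      .astype(float)
      .sort_index()
)
print(ts.tail())
```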
You can play around with parameters in various ways. For example, here I select a range of weeks in 2020 from the `metro` table.
# Start by constructing a basic dictionary
params = metro._define_base_params(language='en')
params['variables'][0]['values'] = ['10']
params['variables'][1]['values'] = ['>2020U45<=2020U52']
metro.get_data(params=params)
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Seasonally adjusted | 2020U46 | 59.0 |
| 1 | Seasonally adjusted | 2020U47 | 54.0 |
| 2 | Seasonally adjusted | 2020U48 | 56.2 |
| 3 | Seasonally adjusted | 2020U49 | 54.5 |
| 4 | Seasonally adjusted | 2020U50 | 44.3 |
| 5 | Seasonally adjusted | 2020U51 | 40.2 |
| 6 | Seasonally adjusted | 2020U52 | 44.8 |
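One last convenience: if you find editing the base dictionary by position (`params['variables'][0]`, `params['variables'][1]`, ...) error-prone for wide tables, you can select by variable code instead. The small helper below is mine, written purely for illustration; it is not part of `DstApi`.

```python
# Illustrative helper (not part of DstApi): overwrite the selection for a given
# variable code in a base params dictionary
def set_values(params, code, values):
    for var in params['variables']:
        if var['code'] == code:
            var['values'] = values
            return params
    raise KeyError(f"Variable '{code}' not found in params")

# Example: refine the DNVPDKR2 base dictionary by variable name rather than position
params = dnrk.define_base_params(language='en')
set_values(params, 'TYPREAL', ['FK'])   # fixed rate convertible bonds
set_values(params, 'UDSTED', ['RD'])    # issued by Realkredit Danmark
dnrk.get_data(params=params)
```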
And that’s it!
I hope this guide was useful, and that the `DstApi` class proves as helpful to you as it is to me.
Once again, let me conclude with a shout-out to Statistics Denmark, a real national treasure. Thanks for all your work in gathering, organizing, and publishing data for everyone to use. It’s a fantastic service, and one for which you’ll never be thanked enough.